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A  ROBUST  ESTIMATOR  OF  LOCATION  USING  AN  ADAPTIVE  SPLINE  MODEL 

by 

Dong  Yoon  Kim 


ABSTRACT 

This  paper  contains  a  new  approach  toward  the  robust 
estimation  of  a  location  parameter.  We  propose  NPS  (Normal 
Pareto  Spline)  distribution  which  provides  rough  fit  to 
density  functions  for  arbitrary  unimodal  symmetric  distri¬ 
butions.  The  bases  of  our  NPS  estimation  are  Pareto  tails 
and  spline  constraints .  Pareto  tails  can  represent  a 
diversity  of  tail  behavior,  and  spline  constraints  ensure 
the  smoothness  of  the  density  function. 

We  show  that  the  NPS  estimate  of  location  has  lower 
asymptotic  variance  than  Huber's  M-estimator  in  most  cases, 
regardless  of  how  Huber's  trimmed  constant  k  is  chosen. 

We  also  show  that  the  NPS  estimate  of  location  can 
guarantee  resistance  for  outliers. 

For  the  generalized  two  sample  location  problem,  where 
the  scale  parameters  are  unequal,  we  propose  an  iterative 
method  to  estimate  the  shift  parameter  and  also  have  a  proof 
that  this  iterative  method  converges  to  the  desired  M-estimate 
for  an  arbitrary  scale  location  family  of  symmetric  distributions. 

Some  Key  Words:  Pareto  Tails,  Spline,  Outlier,  Robust  Adaptive  Estimation 
AMS  1980  subject  classification.  Primary  62F35 
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Chapter  1 
Introduction 

Suppose  we  assume  for  our  underlying  statistical 
model  a  set  of  distributions  {P^},  9  e  E,  that  are  fairly 
well  specified,  and  then  in  terms  of  this  model,  find  a 
good  estimator  for  some  characteristics  of  the  true 
underlying  distribution.  If  the  true  distribution  of 
the  population  is  not  closely  approximated  by  one  of  the 
set  {P  },  0  £  E,  then  the  estimator  can  have  a  large 
error,  no  matter  how  large  the  sample  size.  To  safeguard 
against  this  danger,  we  need  a  robust  estimator. 

Our  approach  for  symmetric  distributions  is  to  use 
a  sufficiently  rich  class  of  distributions  to  approximate 
the  family  of  symmetric  distributions.  The  distributions 
in  our  class  will  be  called  Normal  Pareto  Spline  distributions . 
Given  i.i.d.  observations  x-,  ,  .  .  .  ,x  from  an  arbitrary 
symmetric  distribution  g,  we  shall  estimate  g  and  its 
characteristics  by  using  the  maximum  likelihood  estimate 
(MLE)  on  the  false  assumption  that  g  €  NPS .  The  MLE 

A 

gives  rise  to  9  and  Pg  .  This  estimator  will  be  called 
the  NPS  estimator.  Insofar  as  the  NPS  family  of  distri¬ 
butions  is  very  rich,  we  can  expect  that  there  will  be  a 

member  fQ  close  to  g,  and  the  NPS  estimator  will  be  close 
to  f^  and  therefore  to  g.  A  problem  with  evaluating 
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this  approach  arises  from  the  fact  that  it  is  not  obvious 
what  the  choice  for  the  closest  fQ  to  g  should  be. 

In  a  sense  we  are  moving  a  step  toward  nonparametric 
density  estimation  by  estimating  g  through  this  rich 
three-parameter  family  of  NPS  distributions.  In 
another  sense,  to  be  explained  later,  this  approach 
may  be  regarded  as  an  adaptive  generalization  of  a 
version  of  Huber's  M  estimator. 

In  chapter  2,  we  summarize  other  approaches,  both 
nonadaptive  and  adaptive,  to  robust  estimation.  In 
chapter  3,  we  define  the  NPS  distribution,  and  explore 
the  characteristics  of  the  NPS  family  of  distributions. 

In  chapter  4,  we  derive  asymptotic  properites  of  NPS 
estimates  and  summarize  simulation  results.  Also  we 
show  that  the  NPS  estimate  of  location  will  usually 
perform  better  than  Huber's  M-estimator.  In  chapter  5, 
we  discuss  several  variations  of  the ' two- sample  location 
problem  and  also  introduce  an  asymmetric  NPS  distribution. 
In  chapter  6,  we  explain  certain  computational  techniques 
used  in  this  dissertation,  including  a  simplex  method,  a 
Monte-Carlo  swindle,  and  a  variance  reduction  method  for 
the  logistic  distribution.  In  chapter  7,  we  summarize 

all  results.  In  the  appendix,  we  have  program  lists  for 
the  NPS  MLE , 
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Chapter  2 
Background 

While  model  building  is  certainly  desirable,  we 
know  in  practive  that  most  models  will  not  exactly  fit 
the  real  situation.  A  realistic  approach  seeks  statistical 
procedures  good  for  a  broad  class  of  possible  underlying 
models.  Such  statistical  procedures  are  called  robust. 

2.1.  Nonadaptive  estimators 

In  1964,  Huber  introduced  M-estimates,  which  are 
flexible  and  can  be  generalized  to  multiparameter  problems. 

Any  estimate  Tn  which  minimizes 
Ep(x^;Tn)  where  p  is  an  arbitrary  function  is 
called  an  M-estimate. 

Simplified  versions,  as  location  estimates,  involve 
p(x,T)  of  the  form  p(x-T)  for  some  function  p,  with 
p  ( 0 )  =  0,  p(x)  0  for  all  x.  Many  nonadaptive  robust 
estimates  are  M-estimates. 

As  examples  of  M-estimates; 

(i)  If  p(x)  =  x2,  the  corresponding  estimator  is 
the  sample  mean. 

(ii)  If  p(x)  =  | x |  the  corresponding  estimator  is 
the  sample  median. 
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(iii)  In  Huber's  (1964)  M-estimate 


( 


P(x)  =  J, 


ix2 

2X 


if  i x ]  <  k 


(2.1.1) 


k|x|  -  ^j-k2  if  | x |  >  k. 


and  the  corresponding  estimator  is  closely  related  to 
Winsorizing. 

(iv)  if 


C 


p(x)  = 


-i^-x2  if  |  x  |  <_  k 


jk2  if  | x |  >  k, 


the  corresponding  estimator  is  closely  related  to  a  trimmed 
mean . 

As  a  multiparameter  estimate,  the  choice 


p ( x ; 9 )  =  -  log  g(x;6) 


gives  the  ordinary  MLE  where  the  underlying  distribution 
is  g. 

Two  other  commonly  used  robust  estimators  are  the 
L  and  R  estimators.  An  estimate  is  an  L-estimator 
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(Linear  combination  of  order  statistics)  if  it  is  of  the 
form 


T 


n 


£winx(i) 


Ew . 
in 


1 


and  the  X(jj  are  order  statistics.  The  trimmed 

mean  corresponds  to 


w.  =  A(i/(n+l))  where 
in 


( 


(l-2o) 


A(t)  =  A 

0 


if  a  <  t  <  1  -  a 


if  t  <  a  or  t  >  1  -  a. 


An  estimator  is  an  R-estimator  if  it  is  of  the  form 


Tn  =  median  where  (j=l,...,n,  k=j,...,n) 

“jk  ’  Vtk-j)7^  idi-  di  -  °  £or  311  xjk  =  <x(j)+x(k)> 

The  Hodges -Lehmann  estimator  corresponds  to  d.=  ...  *  d  =  1. 

These  estimators  can  be  modified  a  bit  to  be  adaptive. 

For  example,  for  the  M-estimator,  one  may  replace 
p  (x)  by  p(x/s)  where  s  is  an  estimate  of  scale 
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based  on  the  data.  The  L  and  R  estimators  are  scale 
invariant  so  no  such  modification  is  needed  for  these 
estimators.  The  usual  meaning  of  "adaptive”  as  applied 
to  an  estimator  is  that  the  form  of  the  estimator  adopts 
according  to  the  shape  of  the  sample  distribution,  not 
merely  the  scale. 

2.2.  Adaptive  estimation 

In  1956,  Stein  published  a  paper  which  dealt  with 
the  problem  of  estimating  and  testing  hypotheses  about 
a  parameter  0 .  The  question  he  asked  was  "when  can  one 
estimate  '  0  as  well  as  asymptotically  not  knowing  the 
true  distribution  of  a  population  as  knowing  the  true 
distribution."  Stein  gave  a  simple  necessary  condition 
for  several  important  examples  and  he  indicated  a 
procedure  for  testing  whether  a  center  of  symmetry  has 
a  specified  value  that  should  work. 

/s 

Consider  estimators  ©n  of  the  location  parameter  0 
based  on  a  sample  (X^ , X- , • . • , Xn)  from  an  unknown 
distribution  G(x-0)  which  is  symmetric  about  the  origin, 
and  has  density  g(x-0).  We  can  divide  the  previous 
literature  on  adaptive  estimation  methods  into  two  main 


streams . 
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One  stream  of  research  on  adaptive  estimation  finds 

/N 

an  estimator  of  0  such  that  9n  is  asymptotically 

efficient  under  the  model.  I.e., 

1 

L  (n2  ( en-e )  )  -  N(0,T-1)  as  n  -  <=°  (2.2.1) 

where  I  denotes  the  Fisher  information  on  9  from 
the  distribution  G(x-0). 

Takeuchi  (1971)  considers  a  fictitious  random  sub¬ 
sample  of  size  k  drawn  from  the  original  sample  and 
constructs  the  best  linear  estimator  based  on  the  sub¬ 
sample.  Since  he  estimates  the  variance-covariance 
matrix  of  the  order  statistics  of  the  subsample,  this 
method  can  be  classed  as  an  adaptive  estimate. 

Stone  (1975)  takes  any  estimator  0n  of  0  which 

1 

satisfies  n^  (9”  -9)  =  0  (1)  as  n  -*■  00 .  By  using  a 

n  p 

nonparametric  estimate  of  L(x)  =  g'(x)/g(x),  he  imitates 

a  single  step  of  a  Newton-Raphson  iteration  solving 
n 

J  L(X-0n)  =  0  with  0^  as  the  initial  approximation. 

/s 

Since  L(x)  is  estimated  from  the  data,  0n  is  an 
adaptive  estimate  resembling  the  MLE . 

Van  Eeden  (1970)  and  Beran  (1974)  estimate 
4>(u,g)  =  -  g'  [G  ^(u)]/g[G  ^(u)]  which  provides  the 
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approximate  scores  for  the  best  linear  rank  statistic, 
using  a  window  scheme  and  a  Fourier  transform  respectively 
We  call  these  methods  adaptive  estimate  because  the 
<j>(u,g)'s  are  calculated  from  the  data. 

Beran  (1978)  estimates  the  density  from  the  data  in 
the  sense  of  nonparametric  minimum  Hellinger  distance. 
Using  the  estimated  density,  he  estimates  the  location 
parameter  0. 

Though  all  of  these  foregoing  methods  have  very 
desirable  mathematical  properties,  they  are  very  hard  to 
implement  and  require  many  calculations.  Also,  attainment 
of  their  asymptotic  behavior  seems  to  require  very  large 
sample  sizes. 

The  adaptive  estimation  literature  contains  a  second 
stream  of  papers  describing  methods  which  are  not  fully 
asymptotically  efficient,  but  which  are  easy  to  implement 
and  which  require  relatively  small  amounts  of  calculations 

Hogg  (1974)  (modified  by  De  Wet  and  Van  Qyk  (1979) , 
Harter  et  al.  (1979) )  uses  the  trimmed  mean  in  the  special 
symmetric  case  and  shows  how  to  select  the  amount  of 
trimming.  Since  these  methods  use  the  trimmed  mean, 
they  do  not  satisfy  the  condi tion ( 2 . 2 . 1)  of  asymptotic 
efficiency  except  in  the  rare  case  in  which  a  trimmed 
mean  is  asymptotically  efficient. 
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2.3.  Why  NPS  estimation? 

Beran's,  Stein's  and  similar  methods  have  excellent 
asymptotic  properties,  but  are  not  practical  for  everyday 
use.  Hogg's  method  is  easy  to  implement  but  it  has  less 
desirable  theoretical  properties.  NPS  estimation  lies 
between  these  approaches.  We  might  say  that  Hogg's  method 
is  discrete  and  restricted.  (Since  it  is  developed  by 
considering  only  a  few  possible  underlying  distributions) 
The  NPS  estimate  selects  from  a  continuous  range  of 
distribuitonal  shapes  measured  by  a  shape  parameter  y 
and  can  adapt  to  a  wide  range  of  sample  tail  behaviors. 

Except  for  Beran's  (1978)  method,  only  the  NPS 
estimate  suggests  the  rough  shape  of  the  underlying 
distribution . 


I.4  * 

Chapter  3 

NPS  (Normal  Pareto  Spline)  Distributions 

3.1.  Definition  of  NPS  distributions 

A  random  variable  X  has  standard  NPS  (Normal  Pareto 
Spline)  distribution  with  tail  parameter  y,  if  it  has  a 
density  of  the  form 


( 

2  , 
ax  +b 
e 

if 

1  x  l  <  1 

fQ (x,y) 

=  -1 

l 

1 

(3 

10c(  1  +c( 

1  x  l 

-i)fY“ 

-L 

if 

1  <  1  X  l  <  A, 

*  0 

If  y  > 

0 

then  A 

is 

00 ,  and 

if 

Y 

<  0  then  A 

=  1 

c 

Y 

If  y  = 

0, 

then  A  = 

00 

and 

f0 

(x'0)  =  10c  e 

-!< 

Jx  |-1) 

if 

1 

<  1  x  | 

(3 

The  parameters  a,b  and  c  depend  on  y,  and  are 
determined  by  the  requirement  that  Pr{|x|  >  l)  =  0.2 
and  the  spline  constraints  that  the  density  and  the  first 
derivative  of  the  density  are  continuous  everywhere. 

Thus  the  parameters  a,b  and  c  satisfy  the  following 
spline  equations : 


^|o 
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f0(l+)  ' 

=  f0(l")  (3.1.2) 

or 

a  +  b  = 

-  log  (10c) 

and 


f^(l+)  = 

=  f q ( 1  )  (3.1.3) 

or 


2ac  =  - 

(1+Y) 

and 

1  2 

/  eax  Ddx  =0.8  (3.1.4) 


-1 

In  addition. 

we  often  consider  the  family  of  variables 

Y  =  y  +  xX,  where  X  has  the  standard  NPS  distribution. 
Then  Y  has  the  NPS  distribution  NPS(y,x,y),  with  center 


at  y  and  interdecile  range  equal  to  2x.  An  illustration 
appears  in  Fig.  3.1.5. 
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Fig.  3.1.5  Schematic  illustration  of  the 

NPS  distribution  (density  shown  has  y  =  0) 

U-T  :  10  1  percentile 

Vi  :  50^  percentile  (median) 

x.  U, 

Vi+t  :  90^  percentile 

The  three  primary  parameters  are  u,  the  center  of  symmetry; 
t  the  scale  parameter;  and  y,  the  tail  shape  parameter. 

The  other  parameters  a,b,c  are  determined  implicitly  by  y, 
and  are  tabulated  in  Table  3.1.6. 
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Table 

3 .1.6  a,b,c 

as  functions 

of  y 

Y 

a 

b 

c 

-0.5 

-0.6410 

-0.7210 

0.3900 

• 

o 

1 

-0.7242 

-0.6971 

0.4142 

-0.3 

-0.7997 

-0.6766 

0.4377 

-0.2 

-0.8687 

-0.6583 

0.4604 

-0.1 

-0.9324 

-0.6417 

0.4826 

0 

-0.9915 

-0.6265 

0.5043 

0.1 

-1.0466 

-0.6126 

0.5255 

0.2 

-1.0983 

-0.5998 

0.5463 

0.3 

-1.1469 

-0.5878 

0.5667 

0.4 

-1.1927 

-0.5767 

0.5868 

0.5 

-1.2364 

-0.5663 

0.6066 

0.6 

-1.2778 

-0.5565 

0.6261 

0.7 

-1.3172 

-0.5473 

0.6453 

0.8 

-1.3549 

-0.5386 

0.6643 

0.9 

-1.3910 

-0.5303 

0.6830 

1.0 

-1.4256 

-0.5225 

0.7015 

1.1 

-1.4588 

-0.5150 

0.7198 

1.2 

-1.4907 

-0.5079 

0.7379 

1.3 

-1.5215 

-0.5011 

0.7558 

1.4 

-1.5513 

-0.4946 

0.7736 

1.5 

-1.5800 

-0.4883 

0.7912 
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The  central  portion  of  the  density  is  Gaussian  and 
the  tails  represent  a  reparametrization  of  Pareto  densities. 
(See  DuMouchel  (1983)). 

We  will  usually  restrict  ourselves  to  y  >  -0.5 
because  for  y  <_  -0.5,  f(x)  does  not  have  continuous 
derivatives  at  the  end  points  y  +  x *A. 

A  family  of  densities  which  are  Gaussian  in  the  middle 
and  have  a  variety  of  tail  behaviors  are  useful,  realistic 
models  for  many  kinds  of  data.  Having  heavy  tails  (y  larger 
than  0)  allows  us  to  model  outlier-prone  data,  since,  if 
X  has  an  NPS  distribution  with  y  >  0,  then  E{ |x|^}  =  00 
for  p  >  — . 

-  y 

3.2.  Tail  behavior  of  the  NPS  distribution. 

The  family  of  NPS  distributions  can  represent  a 
diversity  of  tail  behavior.  (See  Fig.  3.2.2).  At  y  =  0 
we  get  exponential  tails.  Anscombe  (1961)  mentions  that 
Generalized  Pareto  tails  with  y  >  0  can  be  generated 
by  gamma  mixtures  of  exponential  distributions.  When 
y  <  0,  the  distribution  is  truncated;  when  y  =  -  1, 
the  uniform  distribution  on  (y-1.25x,  y+1.25x)  results, 
while  y  =  0.5  leads  to  a  triangular  tail  behavior. 


Since 
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*  fQ(x,Y) 


if  1  <  x  <  A 


f0(x,Y) 


_i_2 

+  X(x-i) }  Y 

10c 


if  1  <  x  <  A 


and 


fQ (x,y) 


-  —  -  3 

(l+Y)  U+2Y)  ;x  +  X (x-i)  }  Y 
10c3  c 


if  1  <  x  <  A 


tail  behavior  is  as  described  in  Table  3.2.1,  Some  graphs 
of  NFS  distributions  are  presented  in  Fig.  3.2.2. 

Table  3.2.1  The  tail  behavior  of  NPS(0,1,y)  as 
a  function  of  y  where  1  <  x  <  A. 

Range  of  y  fl(xy  ) 

0  <  y 

-0.5  <  y  <  0 

Y  =  -0.5 
-1  <  Y  <  -0.5 

Y  =  -1  0 


f0(x,Y) 

+ 

+ 

0 


20 


0,2 


f(X) 

0.4 


f(X) 


0.6 


0+fl 


x 


OO  -s 


f(X) 


Fig.  3.2.2 


DIFFERENT  NPS  DISTRIBUTIONS 
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3.3.  Moments  of  the  NPS(0,l,y)  distribution 

Let  the  random  variable  X  have  a  standard  NPS 

1  k 

distribution.  If  y  <  E(X  )  exists.  Since  f  is 
symmetric  around  0, 

E{x^}  =0  if  k  is  odd,  and  y  < 

So  the  mean  is  0  (if  y  <  1) ,  and  the  skewness  is  0 
(if  Y  <  j). 

1  2  A 

Var(X)  =  EX^  =  J  x^*eaX  +^dx  +  2*/  x^'-r^j— {1  +  -^(x-1)  } 

1  IOC  c 

Integrating  by  parts ,  we  have 


1  o  2 

r  2  ax  +b, 

J  x  e  dx 

-1 


l-4c 
10a-  c 


.  1 

10c 


■{1  +  ~  ( x—  1 )  } 
c 


1 

Y 


1 

dx 


c  c 

To  “  5  (Y-l)  +  5(Y-1)  (2Y-1) 

j 

I 

00 


if  Y  <  \ 

if  y  -  \ 


and  finally 


Var (X) 


^-~^c  +  —  -  2c  - 2c -  if  y  <  — 

lOac  5  5  (y-l)  5  (y-l)  ( 2y-l)  T  2 

J, 


(3.3.1) 


OO 


I  • — ' 


22. 


EX4  =  J  x4  *eax  +bdx  +  2  •  J  x4-t^-{1  +  *(x-1) 
—  ^  2^  x  u  c  c 


i 

’  Y 


/1x4eax2+bdx  =  2a-3<1~4c>; 
-1  20a  c 


dx  = 


1 

dx ; 


2  3 

1  4c  12c  24c 

To  -10(Y-1)  +  10  (y-1)  (2y-l)  ■  10(Y-1)  (2y-l)  (3y-1) 

24c  • ^  y  ^  1 

+  1o(y-D  T7FT7 TTFT7  (4y-ll  l£  4 

“  if  Y  i  }! 


Finally 


pv4  _  2a-3 (l-4c)  ,1  4c  ,  12c2 

20a2c  +  5  ~  5 ( y-1 )  +  5 (y-1)  (2y-l) 

if  y  < 

24c2  24c4 

5 (y-1) (2y-l) (3y-l)  5 (y-1) (2y-l) (3y-l) (4y-l) 


Y  ; 


and  the  kurtosis  of  X  is  given  by 
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k  = 


EX 


{Var  x}‘ 


The  variance  and  kurtosis  are  tabulated  in  Table  3.3.3 
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3.4  Information  matrix  for  the  NPS(y,a,y)  distribution 

The  random  variable  Y  =  y  +  tX  has  density 
f(y,0)  =  By  definition,  the  information 

matrix  I  is  given  by 


I (H, x ,y) 


> 


We  restrict  ourselves  to  y  >  -0.5,  because  I  =  00 

-2 

for  y  £-0.5.  Since  I(y,T,y)  =  t  1(0, l,y),  we  evaluate 
I  for  u  =  0  and  t  =  1. 

I  y  I  <  i 

(3.4.1) 

1  <  |  y  |  <  A 

3  123-1  =  -  1  +  y  .l_l£S_£  (3.4.2) 

^  3  It 


9  log  f 

3y 


=  \ 


-2  ay 


1+Y 


c { 1  +  “^ (  |y  1-1)  ) 
c 


if 


*  sgn  (y)  if 


and 
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3  log  f 

5y 


3a  2 

3yy 


+ 


3b 
3  Y 


-  ^|^  +  Y  2log  (  1  +  £(  I  y  I  -1)  } 


if  I  y  I 


if  1  < 


_  (1+y"1) (lyl-l)  f  -l_v.  -2 

{i+X(lyl-D) 


Since  - — —  is  an  odd  function  of  y  and  - — 

and  - — —  are  even  ,1  =  I  =0  and 

3y  ux  uy 


I(y,x,y)  = 


'U,  U 


1 t / X  IT,Y 


Y  f  x 


Y  fY 


In  particular, 


= - 4-{  0 . 8a  + 


Y(l+Y) 


10c  (1+2y) 


-} 


1+Yr  4 _ _ Y_ 


2  5c  _  2'..,  \ 
x  5c  (1+2y) 


where  c  is  a  function  of  y  tabulated  in  Table  3.] 


<  1 

I y I  <a 

(3.4.3) 

f 


(3.4.4) 
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Table  3.4.1  Information  matrix 


Y 

I,  , 

I 

I 

I 

P,  P 

X  ,T 

x  ,Y 

Y,  Y 

-0.4 

2.557 

4.581 

3.015 

3.6 

-0.3 

1.828 

2.876 

1.581 

2.359 

-0.2 

1.642 

2.04 

0.764 

1.344 

-0.1 

1.588 

1.636 

0.4072 

0.8807 

0 

1.586 

1.403 

0.2215 

0.6224 

0.1 

1.608 

1.254 

0.1067 

0.4273 

0.2 

1.642 

1.151 

0.02767 

0.2766 

0.3 

1.683 

1.078 

-0.02421 

0.1785 

0.4 

1.728 

1.024 

-0.05766 

0.1182 

0.5 

1.774 

0.985 

-0.07814 

0.08141 

0.6 

1.822 

0.9562 

-0.09034 

0.0585 

0.7 

1.869 

0.9351 

-0.09738 

0.04387 

0.8 

1.917 

0.9197 

-0.1008 

0.03425 

0.9 

1.964 

0.9086 

-0.102 

0.02772 

1 

2.01 

0.9008 

-0.1015 

0.02314 

1.1 

2.055 

0.8955 

-0.1002 

0.01982 

1.2 

2.1 

0.8923 

-0.09823 

0.01733 

1.3 

2.144 

0.890.6 

-0.09583 

0.0154 

1.4 

2.186 

0.8902 

-0.09359 

0.01388 

1.5 

2.228 

0.8908 

-0.0909 

0.01263 

3.4.2 

Asymptotic 

standard 

deviations 

and  correlation 

coefficient  of  MLE 
matrix 

based  on  inverse  of  Informati 

Y 

CT, 

CT 

P 

CT 

P 

T 

x  /  Y 

Y 

-0.4 

0.6253 

0.6974 

-0.7424 

0.7867 

-0.3 

0.7397 

0.7418 

-0.6068 

0.819 

-0.2 

0.7805 

0.7891 

-0.4613 

0.972 

-0.1 

0.7934 

0.8312 

-0.3392 

1.133 

0 

0.794 

0.8689 

-0.237 

1.305 

0.1 

0.7886 

0.9025 

-0.1457 

1.546 

0.2 

0.7803 

0.9331 

-0.04904 

1.904 

0.3 

0.7708 

0.9648 

0.0552 

2.37 

0.4 

0.7608 

1.002 

0.1657 

2.949 

0.5 

0.7507 

1.048 

0.2759 

3.646 

0.6 

0.7409 

1.107 

0.382 

4.474 

0.7 

0.7314 

1.179 

0.4808 

5.445 

0.8 

0.7223 

1.267 

0.568 

6.565 

0.9 

0.7136 

1.37 

0.6428 

7.841 

1 

0.7054 

1.481 

0.7029 

9.242 

1.1 

0.6975 

1.604 

0.7523 

10.78 

1.2 

0.6901 

1.727 

0.79 

12.39 

1.3 

0.683 

1.843 

0.8183 

14.02 

1.4 

0.6763 

1.965 

0.842 

15.73 

1.5 

0.6699 

2.056 

0.857 

17.27 
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Chapter  4 

The  One  Sample  Location  Problem:  Estimation 

4.1.  Problem  description 

Let  '  *  *  * '^n  denote  a  random  sample  from  a 

continuous  population  with  symmetric  unimodal  distribution 
function  G(y-y).  In  this  chapter  we  shall  deal  with  the 
problem  of  estimating  the  location  parameter  y.  Our 
estimator  will  be  that  derived  from  computing  the  maximum 
likelihood  estimate  of  0  =  (y,x,y) '  based  on  the  assumption 
that  the  distribution  belongs  to  the  NPS  family.  We  shall 
call  this  the  NPS  estimate.  By  estimating  y  we  describe 
the  tail  behavior  of  the  distribution  g.  Inasmuch  as  this 
estimate  of  y  affects  our  procedure  for  estimating  the 
location  parameter  y ,  we  may  regard  the  NPS  estimate  of  y 
as  adaptive . 

For  large  samples  one  should  expect  the  NPS  estimate 
to  be  close  to  that  value  of  0  that  corresponds  to  the 
NPS  distribution  that  is  closest  to  G.  However  there  are 
several  notions  of  a  closest  distribution  which  may  be 
considered  and  we  shall  describe  three  in  the  next  section. 

4.2.  Three  concepts  of  closest  NPS  distribution 

The  first  concept  we  introduce  is  that  of  the 
NPS  distribution  with  the  same  median,  variance  and 
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90-th  percentile  as  G(x-y) .  Since  these  parameters  of 

2 

the  NPS  distributions  are  y ,  t  multiplied  by  Var  (X) 
derived  in  section  3.3,  and  y+T  respectively,  these 
matching  conditions  can  be  used  to  determine 
9m  =  (U,, , tw,yw)  ' .  This  concept  seems  to  be  rather  naive. 

It  is  based  on  some  arbitrary  choices  such  as  the  90-th 
percentile  and  is  unlikely  to  have  sound  theoretical 
justification.  Moreover  9..  is  not  defined  if  the 
variance  of  Y  is  infinite. 

A  second  concept  derives  from  the  fact  that  under 
suitable  regularity  conditions,  the  NPS  estimator  will 
converge  to  the  NPS  distribution  which  is  closest  in  the 
Kullback-Leibler  sense.  That  is,  we  select  9  to 

KXj 

minimize 

I  (gp ,  f  0)  =  lo9  tf  (yj  9)  ]  dy  (4.2.1) 

where  f g (y)  =  f ( y I  9 )  represents  the  density  of  the 
NPS (  y ,  t  ,  y )  distribution  and  g^  the  density  g(y-y) 
of  G(y-y) . 

This  concept  introduces  some  difficulties.  For  y  <  0, 
f  is  a  distribution  of  bounded  range  and  I(g^,fg)  =  00 
if  g^  has  infinite  support.  Thus,  even  though  for  some 
y  <  0,  f-  may  resemble  g,  very  closely,  fg  will  not  be 
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a  candidate *for  the  closest  NPS  distribution  to  g  .  In 
particular,  numerical  calculations  in  Table  4.2.2  demon¬ 
strate  that  the  closest  NPS  distribution  in  the 
Kulback-Leibler  sense  to  the  standard  normal  N(0,1) 
is  of  the  form  NPS(0,tkl,0)  with  =  1.2508.  Also 

the  closest  for  the  standard  logistic  is  NPS  (0 ,1^,0) 
with  x  =  2.171.  (See  Table  4.2.3) 

As  we  shall  see,  simulations  of  NPS  estimates  from 

A 

N(0,1)  data  yield  values  of  y  around  -0.2.  In  contrast 
to  0^. ,  our  first  relatively  naive  concept  gives 
(1VTM,Ym)  =  (0 , 1.282 ,-.  218)  as  the  parameter  of  the 
closest  NPS  estimator.  Thus  9„  seems  to  be  more 


M 


relevant  than  9 


KL ' 


Finally  we  introduce  a  third  concept.  Let  y^ 
t  h 

be  expected  i  order  statistics  among  n  samples  of 

G(y-u)  for  1  <_  i  <_  n.  The  closest  distribution  will 

be  NPS(y  , t  ,y  )  where  0  =  (y  ,  x  , y  )  is  the 
n'  n' 'n  n  Mn'  n' 'n 

NPS  estimate  based  on  the  synthetic  (nonrandom)  sample 

y^  n'V2  n'***'^n  n*  vectcr  ®n  be  called  the  synthetic 

parameter.  It  is  clear  that  ©n  depends  on  n  and 

one  would  expect  yn  to  converge  to  0  as  n  00  if 

ga  corresponds  to  N(0,1)  or  L(0,1).  However  we  shall 

see  that  for  moderately  large  n,  yn  will  be  about 
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Table  4.2.2 

max  / ( log  f ) *g  dx 

T 

for  various  y's 

when  g  v  N  (0 ,1) 

Y 

T 

/(log  f) 

0 

1.25078 

-1.43038 

0.001 

1.25078 

-1.43046 

0.002 

1.25097 

-1.43055 

0.003 

1.25097 

-1.43063 

0.005 

1.25097 

-1.43079 

0.01 

1.25117 

-1.43121 

0.02 

1.25175 

-1.43205 

0.03 

1.25234 

-1.43289 

0.04 

1.25312 

-1. 43375 

0.05 

1.25390 

-1.43461 

0.1 

1.25898 

-1.43899 

0.2 

1.27304 

-1.44789 

0.3 

1.29082 

-1.45664 

0.4 

1.31074 

-1. 46504 

0.5 

1.33222 

-1.47298 

Table  4.2.3 

max 

T 

/ (log  f )  *g  dx 

for  various  y's 

when 

g  ^  logistic 

(0,1) 

Y 

T 

/(log  f) 

0 

2.17101 

-2.00040 

0.001 

2.17082 

-2.00041 

0.002 

2.17042 

-2.00043 

0.003 

2.17023 

-2.00044 

0.00  5 

2.16964 

-2.00047 

0.01 

2.16828 

-2.00056 

0.02 

2.16593 

-2.00077 

0.03 

2.16398 

-2.00102 

0.04 

2.16222 

-2.00131 

0.05 

2.16066 

-2.00163 

0.1 

2.15675 

-2.00366 

0.2 

2.16164 

-2.00911 

0.3 

* 

2.17765 

-2.01553 

0.4 

2.20050 

-2.02230 

0.5 

2.22765 

-2.02913 

•gdx 


•  gdx 
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-0.2,  -0.05  for  the  N(0,1)  and  L(0,1)  respectively. 
In  Table  4.2.4  and  4.2.5,  we  list  the  synthetic  parameter 
0n  based  on  the  synthetic  sample  from  the  N(0,1)  and 
L(0,1)  respectively. 

Table  4.2.4  Synthetic  parameter  9n  based  on  N(0,1) 


n 

^n 

T 

n 

^n 

20 

0 

1.364 

-.761 

50 

0 

1.297 

-.391 

100 

0 

1.289 

-.305 

500 

0 

1.281 

-.227 

1000 

0 

1.280 

-.214 

Table  4. 

2.5 

Synthetic 

parameter  @n 

based  on  L(0 

n 

T 

n 

^n 

20 

0 

2.292 

-.481 

50 

0 

2.208 

-.187 

100 

0 

2.195 

-.113 

500 

0 

2.186 

-.054 

1000 

0 

2.184 

-.046 

We 

shall 

refer  to 

the  parameters 

of  the 

NPS 

distributions 

corresponding  to  these 

concepts 

as 

(1)  Variance  and  90-th  percentile  matching  or 
just  plain  matching ,  0^, 

(2)  closest  Kulback-Leibler ,  QKL,  and 

(3)  synthetic  NPS  parameter,  @n 

In  Table  4.2.6  we  list  three  special  distributions. 
These  are  the  standard  Gaussian,  Logistic  and  Slash,  distri¬ 


butions  . 
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Table  4.2.6  Standard  distributions 


Name 

Notation 

Density 

Support 

Gaussian 

N  ( 0 , 1) 

2 

X 

1 

— e 

/Tir 

-  00  <  X 

-X 

Logistic 

L  ( 0 , 1) 

u  -  e-X>2x2 

-  00  <  X 

Slash 

S(0,1) 

1  1-e"2" 

/2rf 

-  00  <  X 

In  Figures  4.2.8-4.2.10  we  present  the  densities 
of  the  closest  NPS  distributions  to  these  standard 
distributions  for  each  concept,  and  in  Table  4.2.7,  we 
tabulate  the  corresponding  values  of  0  =  (y,T,y)'. 


Table  4.2.6  9M'9KL'9n  for  N(0,1),  L(0,1),  S(0,1) 


For  N ( 0 , 1 ) 


For  L ( 0 , 1 


M 

'KL 


0 


1000 


0 


M 

>KL 

1000 


0 

0 

0 


0 

0 

0 


1.282 

1.251 

1.280 


2.197 

2.171 

2.184 


-0. 

-0. 

-0. 

-0. 


For  S(0,1) 


M 

5KL 


0 


1000 


0 

0 


3.335 

3.335 


not  available  because  Var[S(0,l)] 
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NPS  APPROXIMATIONS  FOR  SLASH  DISTRIBUTION 

-l  APPROXIMATION 
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When  the  underlying  distribution  is  g,  let  * 

be  the  result  from  concept  2  (closest  Kullback-Leibler) 
and  let  f(y;®iooO^  ke  t^le  reSu^-t  from  concept  3 
(synthetic  NPS  parameter,  where  n  =  1000)  of  defining 
the  "closest"  NPS  distribution  to  g.  We  are  especially 
interested  in  the  behaviors  of  ^qqo  ^or  vari°us  commonly 

used  distributions  g  which  have  infinite  support. 

Table  4.2.11  lists  some  possible  g's  for  various 
combinations  of  yKL's  and  YioOo's* 
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Table  4.2.11  Classification  of  values  of  YKL's  and  Y]_ooo's 
for  some  distributions  g  which  have  infinite 
support 


sign  of  ykl 


+ 

0 

— 

NPS(y  >  0) 

★  * 

Center  :  Normal 

Slash 

for  tail  :  Cauchy 

*  *  * 

Cauchy 

Center  :  NPS(y=0) 

NPS (Y=0) 

★  ★ 

for  tail  :  Cauchy 

★ 

Center  :  Normal 

Normal 

for  tail  :  Cauchy 

logistic 

*  *  * 

★ 

NPS(u,t,0)  (Normal)  center  from  minimum  expected  order 
statistics  to  maximum  expected  order  statistics  where 
n  =  1000,  with  Cauchy  for  tails. 

c(<.999) ,  c  of  Normal  center  and  (1-c)  of  Cauchy  far  tails. 

Y  <  0  is  impossible  if  g  has  infinite  support 
KLi 


★  *  ★ 
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4.3.  The  Huber  M-estimator 

We  shall  compare  the  NPS  estimator  with  Huber's 
M-estimator.  The  M-estimator  consists  of  selecting  p 
to  minimizing  £p(y^,p)  or  equivalently  to  set 
E'MyjyVO  =  0  where  ip  =  3p/3p.  One  typically  is  con¬ 
cerned  with  those  examples  where  p  and  ip  may  be 
written  in  the  form  p  (y-p)  and  iMy-p)  respectively. 
Then,  it  is  known  (Huber  (1964))  that  under  regularity 
conditions  on  the  symmetric  distribution  G(y-p),  the 
M-estimator  T^  is  consistent,  and  as  n  00 , 

L(vn(Tn-p))  ->-N(0,a2)  where 

a2  =  Lx)  *<?  (y)  dy 
T  [f'P  (y)  *g  (y)  dx] 


and 


By  Huber's  M-estimator  we  mean 


2/o 

y  /2 


if 


the  M-estimator  where 

I  y  I  <  k 


p (y)  =  • 


(4.3.1) 


^ I y I ”k2/2  if  iyl  >  k 


< P  (y) 


y 


k* sgn  (y) 


if  | y |  <  k 
if  | y |  >  k 


(4.3.2) 


Then  the  asymptotic  variance  is  given  by 


gjj(g,k>  =  £roin  ^2rk.2)?ty >dy  (4.3.3) 

[/  g(y)dy]2 

-k 

This  estimator  may  be  regarded  as  a  variation  of 
the  NPS  estimator  where  the  maximization  with  respect  to 
0  =  (y,T,y)'  is  carried  out  subject  to  the  restrictions 
x  =  k  and  y  =  0.  In  this  sense  the  NPS  estimator  is  an 
adaptive  generalization  of  Huber's  M-estimator,  where  the 
data  are  used  to  estimate  t  and  y.  As  we  pointed  out 
in  section  2.1,  it  is  not  uncommon  to  use  an  adaptive 
version  of  the  Huber  estimator  where  the  scale  parameter 
is  estimated. 

For  later  comparisons,  we  tabulate  the  asymptotic 
variance  of  Huber's  M-estimator  for  several  values  of  k 
for  normal, logistic  and  slash  distributions  in  Table  4.3.4. 


Table  4.3.4. 
Distribution 

k 

.5 

1 

1.5 
2 

2.5 

3 

3.5 

4 


Asymptotic 

variance  of 

Normal 

logistic 

1.2625 

3.4816 

1.1073 

3.1947 

1.0371 

3.0595 

1.0104 

3.0178 

1.0023 

3.0283 

1.0004 

3.0637 

1.0001 

3.1071 

1.0000 

3.1490 

Huber's  M-estimator 
slash 


5.6867 

5.4896 

5.5961 

5.9294 

6.4283 

7.0249 

7.6922 

8.3999 
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4.4.  Asymptotic  distribution  of  NPS  estimates 

If  G  has  infinite  support  and  the  closest  KL 
distribution  to  G(y-y)  has  y  >  0,  then  from  Huber  (1967) 
we  see  that  under  mild  regularity  conditions, 

i- [/n(en-eKL)  ]  -  N(0,E(6kl)  ) 

where  0^  is  the  NPS  estimator  based  on  a  sample  of  size  n 
1(0)  =  B-1AB-1,  (4.4 


A  =  A  (  0 ) 


log  f(Y,0)w3  log  f(Y,0),tl 

E[( - ^'5  — )( - £0 - ’  ] 


(4.4 


and 


B  =  B  (  0 )  =  -  E  (— - 9i-  ,  ( 

30 

It  should  be  noted  that  these  expectations  are  with 
respect  to  the  distribution  G. 

However  the  case  where  G  is  normal  and  y..-  =  0 
does  not  satisfy  the  regularity  conditions.  Indeed  our 
calculations  indicate  that  0^  approaches  0  so  slowly 
that  it  would  be  unreasonable  to  expect  the  limiting 
distribution  of  /n(0  -0VT )  to  be  normal.  Instead  we 


from  G, 

.1) 

.2) 

.3) 
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shall  present  a  heuristic  derivation  to  the  effect  that 

/n(0n-9n)  Is  asymptotically  normal.  This  derivation, 

which  follows,  involves  the  expansion  of  the  log  of  the 

likelihood  about  0=0  and  Y.  =  y.  where  Y.  are 

n  m  m  m 

the  order  statistics  and  y.  are  their  expected  values. 

in 

n  3  log  f (Y .  ,  §  ) 

0  “  X  - 5^ - 

1=1 

n  3  log  f(y.  ,0  )  A 

=  >  - r73 -  +  A. _  -  n(0  -0  )B.  +  higher  order  terms 

.  L .  do  in  n  n  m 

1=1 

(4.4.4) 

where 


m 


n 

=  l 

i=l 


log  f(yin,y 

30  3y 


<Yin-yin>  “ 


A.  ( 0  ) 
m  n 


(4.4.5) 


and 


B. 

m 


t  n 

-  I 

n  i£l 


!og  f(yin>en) 

3  02 


=  B (0  ) +0  (1)  =  B(0V_)+O  (1) 
n  p  KLi  p 


(4.4.6) 


and  the  sum  on  the  right  hand  side  of  (4.4.4)  vanishes. 
Thus 


/n(0  -0  ) 
n  n 


+  °P(1) 


(4.4.7) 


But  A. 

m 


is  a  linear  function  of  the  order  statistics. 
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and  by  the  Chemoff,  Gastwirth,  Johns  (1967)  theorem,  it 
is  asymptotically  normal.  To  be  more  specific,  given  a 
vector  function  H(y) ,  the  distribution  of 


Tn  * 

—  J  H  (y .  )  ( Y .  -y .  ) 

^  i=l  -  in  in  ln 

converges 

to  N(0,Z*)  where 

Z*  = 

cov  (C ( Y) ) 

and 


C  (y) 

=  /  H(y)dy 

In  our  particular  application 


H(y) 

32  log  f(y,0n) 

3y  3  9 

and 

C(Y) 

3  log  f(Y,0n) 

Thus 

L(/n(0n-0n))  =  N(0,Zin)  (4.4.8) 


where 
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Z,  =  b71a_  bT1 
In  In  2n  In 


64.4.9) 


and 


1  9  lo<?  f(irin'9n)  3  log  f(yin'9n} 

A2n  -  < - if  ~ >  < - a"-"'  -  >  ' 


(4.4.10) 


Since  0^  converges  slowly  to  ®KL/  we  tabulate  9 
and  the  elements  of  Z^n  for  various  values  of  n.  We 
include  the  limit  These  tabulations  appear  in 

Table  4.4.11  for  the  normal  and  logistic  where  yVT  =  0. 
For  the  slash  distribution  where  y^  >  0,  we  simply 
present  Z(0KL). 


Table  4.4.1  Various  asymptotic  variances  for 


Normal,  Logistic 

and  Slash  distributions 

ZKL  f°r 

normal 

ZKL  f°r 

logistic 

1.0647 

0 

0 

3.0116 

0 

0 

0 

0.8668 

.1290 

0 

3.3921 

-.1997 

0 

-.1290  0 

.2431 

0 

-.1997 

.4515 

where  0RL  = 

10,  1.251, 

0)  ' 

where 

CD 

* 

II 

O 

,2.171, 

ZKL  f°r 

slash 

5.375 

0 

0 

0 

20.4431 

.3318 

0 

-.3318 

.0582 

where  0„T  = 
KL 

(0,3.335,1. 

246)  ' 
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^1 , 1000  for  normal 
.9919  0  0 

0  .8869  .2364 

0  .2364  .4596 

where  ^]_^]_qqq  =  (0,1.280  ,-. 


Zl,1000  for  fistic 

3.0110  0  0 

0  3.3419  -.2763 

0  -.2763  .3872 

214)'  where  1  10Q0  =  (0,2.184, 


.046) • 
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4.5.  The  Normal-like  distribution 

We  describe  here  another  approach  to  analysing  the 
asymptotic  properties  of  the  NPS  estimator.  This  approach 
has  some  theoretical  shortcomings,  which  are  emphasized 
by  the  comparison  of  the  theory  with  the  simulations  for 
moderately  large  sample  size. 

A  major  theoretical  problem  has  been  that  the 
Kullback-Leibler  information  I(g,f)  =  00  for  g  with 
infinite  support  and  f  NPS  with  y  <  0.  Our  approach 
is  to  replace  g  when  it  has  finite  support  by  a  distri¬ 
bution  g£  which  "look  like"  g  over  most  of  its  range 
but  which  differs  in  the  far  tails,  in  that  it  has  finite 
support.  Since  g  is  close  to  g£  one  may  hope  that 
the  NPS  estimator  applied  to  g  would  have  similar 
properties  to  that  when  applied  to  g  .  Since  g  has 
finite  support,  the  difficulty  with  the  Kullback-Leibler 
information  will  be  alleviated.  For  the  theoretical 
comparison  using  the  "look  alike"  distribution  in  simu¬ 
lations,  we  need  to  go  through  the  following  steps. 

Suppose  t  is  defined  by 
t 

£  c  . 

/  g(y.dy  =  1  -  c/5  and  A£  =  Tnd  -  ~)  where  n  = 
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STEP  1  Define  g  as  follows 
-  3  £ 


ge(y)  = 


g(y) 


he(y) 


IyI  l  t 


fc£  <  IyI  <  Ae 


yI  >  A£ 


As  an  example,  we  will  take  g  to  be  the  normal 
distribution  and  we  call  g£  the  normal-like  distribution, 
indexed  by  the  parameter  e,  and  schematically  represented 
in  Fig.  4.5.1. 


Fig.  4.5.1.  Normal-like  density 
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Here, 


g  (y) 


<My)  where  4>(y) 


T 


/Ttt 


2 

y 

2 


e 


and 


h£(y)  =  (A£-Y)^{a£  +  b£(A£-y)  +  c£(A£-y)2} 

The  conditions  are  that  each  "far  tails"  have  probability 
0.2e  and  that  the  density  and  its  first  derivative  be 
continuous  at  t  and  A  .  Here  t  is  $ ( 1-0 . 2t )  . 

The  parameters  t  ,  A£ ,  a£,  b£  and  c£  are  tabulated 
in  Table  4.5.2. 

Table  4.5.2  Parameters  of  the  Normal-like  distribution 


£ 

t 

£ 

A£ 

a£ 

b£ 

c 

£ 

0.002 

3.353 

3.841 

0.0454 

0.1566 

0.1558 

0.001 

3.540 

4.011 

0.0251 

0.0899 

0.0930 

STEP 

2  Calculate  6Kl£ 

=  (0'TKL£ 

' YKL£} ’ ' 

the 

parameter  of  the  NPS  g  which  is  closest 


to  g£  in  the  Kullback-Leibler  sense. 


To  do  so  we  maximize 


48. 


/ (log  f ) ‘g£dy 


/  U(^)2  +  b 

-T 


log  t  }  •  <f>  (v)  dv 


+  2/  <j>  (y)  [log 

T 


1 

IOtc 


(1  +  h log  11  +  -  1) i) dy 

Y  CL 


=  +  b  -  log  t)  12$  (t)  -  1;  -  2-|<j>(t) 


+  2  log  Yjy—UCtg.)  -  <Kt)  +  . 2e ) 


t 

-  £ 

-  2(1  +  i)/  «My)log  11  +  -  1)  >]dy 

Yt  U  1 


e 


Table  4.5.3  lists 
various  values  of 


TKLe/  ^KLe  w^^-ch  were  computed  for 

£. 


Table  4.5.3  tkl£'  ^KL£  w^ich  minimizes  I(g£,f)  for 
various  £’s. 


£ 

TKLe 

YKL£ 

0.002 

1.309 

-.106 

0.001 

1.291 

-.127 

-  1)  >dy 
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STEP  3  Calculate  asymptotic  variance  of  when 

underlying  distribution  is  g£. 

From  Huber  (1967)  we  see  that  for  very  large  samples 
from  g  ,  the  asymptotic  distribution  of  the  NPS  estimator 

A 

0„_  satisfies 
n  £ 


L  ( / n  ( 9 

ns 


w* 


N(0,Ze) 


(4.5.4) 


where 


I  =  B_1A  B 

£  £  £  £ 


(4.5.5) 


3  log  f(Y  ,  0  .  )  3  log  f (Y  , 0  ) 

\  =  E[( - aT'-—  < - -9  -  —  >’] 


(4.5.6) 


and 


B 

£ 


-  - 

3  9 


(4.5.7) 


and  these  expectations  are  with  respect  to  the  distribution 
g£  of  Y  .  In  Table  4.5.8  the  asymptotic  variance  of 


are  tabulated. 
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Table  4.5.8 

The  asymptotic  variance 

underlying  distribution 

of  ^NPS  When 
is  g£ 

e 

n 

A  *  V  *  of  UNPS 

.002 

500 

1.064 

.001 

1000 

1.045 

In  simulations  which  use  samples  of  n  =  1000  drawn 
from  a  normal  population,  the  variance  of  is  1.013. 

The  asymptotic  theory  of  the  Normal-like  distribution 
slightly  over  estimated  the  asymptotic  variance.  But 
if  we  try  various  kinds  of  constant  multiply  by  e  as 
the  tail  area,  then  we  will  get  better  approximation. 

4.6.  Sensitivity  and  Influence  curves 

The  study  of  robustness  involves  consideration  of 

sensitivity  to  outliers.  The  sensitivity  curve  of  the 

estimator  T  is  defined  bv 
n 

sc (y?y1,y2, . . . ,yn,Tn)  = 

(n+1) (Tn+1 (y1 , . . . ,yn,y)  -  Tn (y1 , . . . , yR) }  (4.6.1) 

where  the  y^'s  are  the  observations.  The  sensitivity 
curve  describes  the  effect  of  an  additional  observation 
at  y.  An  estimator  with  a  high  resistance  to  outliers 
will  have  a  low  sensitivity  for  outlying  values  of  y. 
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For  the  estimator  T  =  y  ,  S C  =  y  -  y  which  becomes 

n  n  n 

large  as  v  +  00 . 

This  curve  is  inconvenient  to  use  because  it  depends 
not  only  on  y  and  T  but  also  on  the  observed  data 
y1 ,y2 , . . . ,y  .  One  wav  of  avoiding  this  difficulty  is 
the  use  of  the  influence  curve.  If  the  estimator  T 

n 

can  be  expressed  as  a  functional  of  the  empirical 
distribution  Gn,  i.e.  Tn  =  T(Gn),  then  Hampel  (1974) 
introduced  the  influence  curve 

IC  (y ;  G ,  T)  =  lim[Tll-e)G  +  e*6y}  -  T{G>]/£  (4.6.2) 

e-*0 

where  5y  represent  the  distribution  which  assigns 
probability  one  to  the  point  y.  For  the  estimator 
T  =  y  ,  T(G)  =  u  and  IC  =  y  -  jj  .  For  any  M-estimator 
it  follows  that  (see  Huber  (1981)) 

IC  (y ;  0 )  =  cip  (y ;  0 ) 

where  c  is  constant  and  ip(y;6)  =  3p(y;0)/8y.  Our 
NPS  estimator  may  be  regarded  as  an  adaptive  M-estimator 
where  the  form  of  ip  is  data  dependent  and  the  above 
result  is  not  sufficient  to  make  the  use  of  the  influence 


curve  convenient  for  our  estimator. 
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We  choose  instead  to  rely  on  the  use  of  the  synthetic 
example  xin'***'xnn  w^ere  xj_n  i s  the  i/(n+l)  fractile 
of  G.  The  resulting  curves  will  be  called  a  stylized 
sensitivity  curve  (SSC) . 

In  Table  4.6.3,  we  present  a  qualitative  description 
of  the  stylized  sensitivity  curve  based  on  Gaussian  G. 

In  Figures  4. 6. 4-4. 6. 6  these  curves  are  graphed  for  the 
Gaussian,  Logistic  and  Slash  distributions  respectively. 

In  each  of  these  cases  the  SCC  is  bounded.  This  is  an 
anticipated  consequence  of  the  following  heuristic 
argument.  First,  as  y  *►  +  00 ,  the  estimate  y  of  the 
tail  thick  parameter  or  x  the  scale  parameter  must 

A 

get  large.  But  x  is  pretty  much  constrained  by  the 
implicit  requirement  that  most  of  the  observations  should 
lie  between  p  +  x.  Explicitly  the  term  -  log  x  which 
occurs  n+1  times  in  the  likelihood  based  on  the  sample 
y,  x  , • • • 'xnn'  ^ee  s  x  from  growing  too  fast.  If  now 
we  treat  the  NFS  estimator  as  an  M-estimator  with  p(y-y) 

✓  »  ^  A  A  \ 

replaced  by  -  log  x  f(y-p;0,x,y)  as  though  x  and  y 
were  fixed,  the  sensitivity  would  be  reflected  by  the 
corresponding 

*(y>  =  Jitlls&L  M -  for  |y|  >  ? 

CT{1  +  I(Z  -  1)} 

C  T 


-20 
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(N+1)x(T(X1.X2 . Xn,X)-T(X  1  ,X2 . Xn)) 

-4  0  4 


Fig .  4.6.4 


SENSITIVITY  CURVE  (NORMAL  CASE) 


-20 
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(N+  1  )x(T(X1  ,X2 . Xn,X)-T(X  1  ,X2 . Xn)) 


Fig .  4.6.5 


SENSITIVITY  CURVE  (LOGISTIC  CASE) 
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(N+1)x(T(X1,X2,...,Xn,X)-T(X1,X2,...,Xn)) 


Fig .  4.6.6 


SENSITIVITY  CURVE  (SLASH  CASE) 
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/>  A 

as  y  -*■  °°,  and  y  °°,  this  quantity  behaves  like  (y-x) 
which  approaches  0. 

For  the  sensitivity  curves  in  Fig.  4.6.4  -  4.6.5 
(Normal  or  Logistic  case)  y  is  negative  where  y  =  0. 


/\  A 


Since 

y  for  n 

=  19 

reaches 

0  much 

faster 

than  y 

for 

n  =  999,  so 

in  the 

n  =  19 

case  we 

have 

an  earlier 

peak 

point  than 

in  the 

n  =  999 

case. 

For  the  sensitivity  curve  in  the  slash  case,  y  is 
positive  where  y  =  0,  so  the  peak  point  occurs  at  x. 

A  /V 

When  y  is  large,  y  for  n  =  999  is  larger  than  y 
for  n  =  19,  so  the  sensitivity  curve  for  n  =  999  drops 
down  faster  than  the  sensitivity  curve  for  n  =  19. 

4.7.  Simulation  results 

Simulations  were  carried  out  to  determine  the  sampling 
properties  of  the  NPS  estimator  for  finite  samples  from 
the  Normal,  Logistic  and  Slash  distributions.  We  present 
the  variance  of  y  (relative  efficiency  compared  with 
M.L.E.)  and  standard  deviation  of  variance  of  y  based 
on  sampel  of  sizes  20,  100,  1000  in  Table  4.7.1.  Also 
for  the  comparison  purpose,  we  present  the  results  of 
further  simulations  using  two  adaptive  trimmed  means  for 
comparisons  with  the  NPS  estimator.  The  two  adaptively 
trimmed  means  will  be  denoted  JBT  and  WHD .  The  JBT  method 


57. 


Table  4.6.3  Behavior  of  the  S.C.  for  the  NPS  estimate  where 
underlying  distribution  is  Gaussian 


Range 

Behavior 

v  I 

>1 

v  1 

o 

A 

Almost  linear,  a  does  not  vary  much 

(since  ’My)  =  ~  2a*y  if  |y|  <  x) 

T  <  y  <_  T* 

As 

Curved  upward,  since  y  <  0  My)  goes 

up . 

X*  <  y  <  x** 

/V 

Curved  downwards,  because  y  increases 

and  approaches  to  0.  We  have  range 

expansion  for  My) 

X  *  *  <  y 

As 

Asymptotically  goes  to  0,  because  y  =  0 

at  x**  and  goes  up.  We  have  infinite 

range  for  My) 
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uses  either  the  8%  trimmed  mean  or  the  25%  trimmed  mean, 
the  choice  based  on  whichever  has  the  smaller  estimated 
standard  error  based  on  variance  calculations  on  the 
particular  sample.  The  JBT  estimate  is  proposed  by 
John  Tukey  and,  as  described  in  Andrews  et  Al .  (1972) 

is  simple,  robust  and  performs  relatively  well.  The 
WHD  method  is  simpler  and  chooses  between  the  ordinary 
mean  and  the  25%  trimmed  mean  based  on  same  criteria, 
proposed  by  William  DuMouchel. 

Table  4.7.2  presents  the  results  from  500  replications 
with  various  variance  reduction  methods  which  will  be 
described  in  sections  6.2  and  6,3. 

Although  these  location  estimates  of  y  may  not  be 
affected  by  outliers,  we  also  present  probability  plot  in 
Fig.  4.7.3  -  Fig.  4.7.5,  so  that  we  can  examine  the  outliers. 

Since  I  is  given  by  I  (1—^3—^-)  2f  dx  ,  for 

M  P  M  o  X 

the  logistic  distribution, 


I 


y ,  y 


lo9 


-x 

— ! £ - _}• 

( l+e~x) ^ 


-x 


(l+e-x) 2 


dx 


1 

3' 


so  the  Cramer  Rao  bound  for  estimating  the  location  parameter 
of  a  logistic  distribution  is  3.  For  the  slash,  by 
numerical  integration,  we  get  4.847  as  the  asymptotic 
variance  of  an  efficient  location  estimator. 
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Table  4.7.1  Product  of  sample  size  and  variance  of  the 

A 

location  estimate  y,  relative  efficiency*, 
and  standard  deviation  of  variation  of  y. 
Distribution  Normal  Logistic  Slash 

sample 
size  (n) 
n  x  v  1000 

efficiency* 
n  x  v  100 

efficiency* 
n  x  v  20 

efficiency* 


1.013+.001**  3.006+.074***  5.240+.139** 
.987  .998  .925 

1 . 067+ . 005  3 . 065+ . 080  5. 698+. 196 

.937  .979  .851 

1 . 086+ . 006  3 . 3  30+ . 12  4  6.711+.380 

.921  .901  .722 


*  compared  with  MLE 

**  from  formula  6.2.1 

***  from  formula  6.3.4 


compared  with  NPS  estimator  from  simulation  results 
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Table  4.7.2  Product  of  sample  size  and  variance  of  the  location  estimate  using  JBT(WHD), 


We  conclude  that  for  most  cases  NPS  estimation  is 


better  than  or  about  as  efficient  as  the  JBT  and  WHO 
estimators  when  the  sample  size  n  is  greater  than  100, 
When  n  =  20,  the  WHD  method  performed  slightly  better 
in  these  simulations,  with  NPS  and  JBT  about  equal. 

If  we  compare  our  simulation  results  for  n  =  1000 
with  the  asymptotic  variance  of  Huber's  M-estimate 
(see  Table  4.3.4) ,  we  can  say  that  regardless  of  how 
Huber's  trimmed  constant  k  is  chosen,  in  most  cases 
NPS  is  better  than  Huber's  M-estimator.  If  the  tail 
behavior  of  the  distribution  generating  data  is  far 
from  exponential,  then  the  NPS  estimator  is  always 
more  efficient  than  Huber's  M-estimator. 
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Chapter  5 

The  Two  Sample  location  Problem  and  the  Asymmetric  Model. 
5.1.  Problem  description 

One  of  the  fundamental  problems  of  statistics,  often 
encountered  in  applications,  is  the  two  sample  location 
problem.  Let  G(x)  be  a  symmetric  distribution  and  let 

^l'^2'***'^m  an<^  Z1 '  z2  '  *  *  *  '  zn  i-n<^ePen<^ent  random 

samples  from  G(y-u^)  and  GCy-v^)  respectively.  It 
is  desired  to  estimate  6  =  0ne  way  to  proceed 

is  to  estimate  <5  by  the  difference  of  two  separate 
NPS  estimations  of  and  \i  ,  thereby  ignoring  the 

fact  that  Y  and  Z  have  common  distribution  except 
for  location.  A  natural  alternative  of  course,  is  to 
extend  the  concept  of  NPS  estimation  to  this  problem  by 
applying  the  method  of  maximum  likelihood  to  the  model 
where  the  Y's  and  Z's  have  common  scale  and  shape 
parameters.  That  is,  we  act  as  though  the  Y^  come 
from  NPS(u^,T,y)  and  the  Z_.  from  NPS(P2>T/Y)- 

Another  alternative  is  to  pool  the  estimate  of  y 

but  to  permit  the  use  of  separate  x's.  This  would  be 

most  appropriate  for  problems  where  one  anticipates  the 

possibility  of  different  scale  parameters  but  common  tail 

y-u-L 

behavior,  i.e.  distribution  of  the  form  G(— ^ — )  and 
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y-u2 

G(--- — •)  .  Then  we  apply  MLE  to  NPS(u^,t^,y)  for 
and  NPS(U2'T2'Y^  for  z-j  • 

5.2.  The  computation  for  the  two  sample  model  with 
possibly  different  variances. 

We  present  an  iterative  method  of  estimating  the 
parameters  for  the  two  sample  problem  with  common  y  and 
possibly  different  scale  parameters. 


STEP  1  Calculate  the  NPS  estimator  for  each  sample 
separately,  yielding  ’  and 

0  =  (U_,x  ,y  )  ’  respectively. 

W  M  Z  Z 

STEP  2  Let 


v . 

i 


and 


1.  1  j  2  f  m  •  m  f  IT-1 


v 


m+ j 


j  =  1,2, ...  ,n 


Based  on  the  sample  {v^  :  1  <_  i  <_  m+n}  apply  MLE 
to  the  model  NPS(0,1,yv)  to  estimate  y  . 


STEP  3  Based  on  the  sample  {y^  :  1  <  i  <  m},  apply 

A  A 

MLE  to  the  model  MPS(u,t,yv)  with  yv  determined  in 
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STEP  2  to  revise  y  ,  x  .  Revise  y  .t  similarly.  If 

y  y  2  2 

y  ,  x  ,  y  and  x  are  all  sufficiently  close  to  the 
y  y  z  z 

previous  values,  stop  the  iteration.  Otherwise  return 
to  STEP  2. 

Proposition  5.2.1  The  iterative  method  leads  to  a 
sequence  of  estimates  which  converge  to  a  local  maximum 
of  the  MPS  likelihood  function. 

To  prove  this  proposition  we  introduce  some  notation. 
Let  L(y;0)  be  the  NPS  likelihood  based  on  the  sample 
Y_  ~  ^1^2'  *  *  •  'Ym)  •  Let  i  represent  a  vector  all  of 
whose  elements  are  unity.  Denote  the  i-th  estimate  of  the 
parameters  V  /.../Yv  hy  y^ , . . . ,Y^V>  with  corresponding 
values  for  v.  =  (xT1(y-y.  1),  xT1(z-y.  1)). 

First,  we  observe  that 

L(y;;y,T,y)  =  x”mL  (T_1y  ;x_1y  ,1  ,y) 

My;y, t,y)  =  l  (y-yl;0  ,x  ,y) 

and  consequently 


L(y;y,T,Y)  =  x_mL ( x"1 (y-yl)  ; 0 , 1  ,  Y)  . 


After  completing  STEP  2,  our  combined  likelihood  is 
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given  by 


L 


1 


< 


< 


TlyTlzL (—1 ' 0 ' Ylv^  =  L(I?yly'Tly 
L(I'^2y,T2y,Ylv)  * L  y2  z '  X2  z '  Ylv) 
T2yT2zL^-2;0 ,1,y2v^  =  L^;y2y'T2y 


,Ylv) *L(-;ylz,Tlz,Ylv) 


-m  * ",ri—  /  a  i  \ 

■  T2y  *  T2zL  — 2 ' 0 ' Ylv 


'y2v) 'L<-;u2z'T2z’y2v) 


Thus  the  likelihood  function  is  nondecreasing  after  each 
pair  of  STEP  3  and  STEP  2.  If  the  parameters  change  at 
a  step  only  if  the  likelihood  increases ,  this  procedure 
leads  to  a  monotonic  increasing  likel  ihood  unless  the 
process  stops.  The  process  can  not  yield  a  limit  point 
which  is  not  a  local  maximum  for  the  gradient  is  not 
zero  at  such  a  point,  and  one  of  the  two  steps  will  lead 
to  a  substantial  increase  in  the  likelihood,  once  we  are 
in  the  neighborhood  of  such  a  point.  Thus  the  process 
must  lead  to  a  local  maximum  or  an  unbounded  sequence. 

But  it  is  easy  to  see  that  the  likelihood  approaches  0 
for  a  sequence  which  is  unbounded  in  the  parameter  space. 
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5.3.  The  asymmetric  model. 

We  have  confined  attention  to  the  location  parameter 
for  symmetric  distributions.  The  estimation  of  location 
for  an  unspecified  asymmetric  distribution  is  not  a  well 
defined  problem.  On  the  other  hand,  from  the  point  of 
view  of  estimating  densities,  we  may  pose  the  problem  of 
using  an  asymmetric  version  of  the  NPS  model  to  approximate 
unimodal  distributions  and  to  estimate  these  distributions 
by  estimating  the  parameters  of  that  model.  We  shall  say 
that  a  random  variable  X  has  the  standard  asymmetric 
NPS  distribution  with  skewness  parameter  s,  left  tail 
parameter  YT,-  and  right  tail  parameter  Yt,/  if  it  has 
a  density  of  the  form 


TOd 


1 


-s  <x<0 


e 


f0<X'S'YL'V  =  ■ 


(5.3.1) 


e 


0<x<l 


1<X<A2,  yr7*  0 
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If 

0 

then 

=  ®,  and  if 

yL  " 

0  then 

A1 

If 

0 

then  A2 

=  00 ,  and  if 

yR  < 

0  then 

A2 

If 

= 

0 

and  x  < 

-s  then  A^ 

s  00 

and 

_i_(_  *  . 

i  dT  V  s 

-  1) 

fo(x,s,0,YR)  = 

Li 


(5.3.1') 


If 


Yr  =  0 


and  x  >  1  then  A-  =  00  and 


f0  (x/s/YL/°)  ~  1^3" 


-  3— (x-1) 
.  dR 


(5.3.1") 


The  parameters  dL,  dR,  aL,  aR,  bL,  bR  and  c  depend  on  s, 
Yl  and  yR,  and  are  determined  by  the  requirement  that 
Pr{-s  <_  x  <  0 }  =  Pr(0  £  x  <  1}  =  0.4,  and  the  spline 
constraints  so  that  the  density  and  the  first  derivative 
of  the  density  are  continuous  everywhere. 

The  parameters  d^,  dR,  aL,  aR,  b^,  bR  and  c 
satisfy  the  following  spline  equations: 


fQ(-s+)  =  fQ(-s") 
f0-  (-s+)  =  fj(-s”) 
fj(0+)  =  f'(O-) 


(5.3.2) 

(5.3.3) 

(5.3.4) 


(5.3.5) 
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(5.3.6) 


(5.3.7) 


(5.3.8) 


0 

In  addition,  we  also  consider  the  family  of  the  variables 
Y  =  y  +  xX  where  X  has  the  standard  asymmetric  NPS 
distribution,  and  Y  has  the  asymmetric  NPS  distribution 
ANPS (y,x,s ,Yl,Yr) ,  with  median  at  y  and  interdecile 
range  equal  to  t(1+s). 

(See  Fig. 


J_ L 


Fig.  5.3.9.  Schematic  asymmetric  NPS  model  density  shown 
has  s=2,  yl=0,yr=0 


where  y-sx  :  10 


th 


percentile 


:  50 


th 


percentile  (median) 


y 


y+x  :  90 


th 


percentile 
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This  is  a  5-parameter  family.  The  five  primary  parameters 
are  y ,  the  location  parameter;  T ,  the  scale  parameter; 
s,  the  skewness  parameter;  the  left  tail  shape  parameter; 

and  y  the  right  tail  shape  parameter.  The  other 
parameters  dT  ,  dR,  aL,  aR,  bL,  bR  and  c  are  defined 
implicitly  by  s ,  yl  and  yR, 

Fig.  5.3.10  presents  some  variations  of  the  asymmetric 
NPS  model.  In  Fig.  5.3.10(a),  the  3  densities 
are  all  symmetric.  One  has  exponential  tails  (yT  =  y„  =  0)  , 
and  for  comparison  the  other  two  have  thicker  tails 
(yL  =  yR  =0.3)  or  thinner  tails  (yl  =  Yr  =  -0.3).  In 
Fig.  5.3.10(b),  all  3  asymmetric  NPS  densities  have 
exponential  tails.  One  is  symmetric  (s=l)  and  for  comparison 
the  other  two  are  skewed  to  the  right  (s=2)  or  skewed  to  the 
left  (s=0.5).  In  Fig.  5.3.10(c),  all  3  asymmetric  NPS 
densities  are  skewed  to  the  right,  and  have  exponential 
tails.  One  has  standard  scale  parameter  (t=1)  and  for 
comparison  the  other  two  have  larger  scale  (t=2)  or  smaller 
scale  (t=0.5).  In  Fig.  5.3.10(d),  all  3  asymmetric 
NPS  densities  have  the  same  scale  and  a  skewness  parameter 
of  s  =  2.  One  has  exponential  tails  and  for  comparison 
the  other  two  have  a  thicker  right  tail  (yd=0.5)  or  a 
thinner  left  tail  (l=-0.5). 
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Fig.  5.3.10 


DIFFERENT  ASYMMETRIC  NPS  DISTRIBUTION 
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Chapter  6 

Computational  Problems 

The  calculation  of  the  NPS  estimates  and  the 
simulations  that  were  carried  out  required  extensive 
computations.  In  this  chapter  we  discuss  three  methods 
which  were  applied  to  reduce  the  computer  time  used  or 
the  programming  difficulty. 

6.1.  The  computation  of  the  NPS  estimates 

The  NPS  estimates  require  a  maximization  subject 
to  the  spline  constraints  (3.1.2)  to  (3.1.4).  First 
numerical  calculations  were  performed  to  construct  an 
accurate  table  representing  a,  b  and  c  as  functions 
of  y.  For  the  later  calculations,  interpolation  on 
that  table  was  performed.  A  portion  of  that  table  is 
presented  in  Table  3.1.6. 

The  next  part  was  that  of  maximizing  the  likelihood. 

For  this  a  simplex  method  developed  by  Nelder  and  Mead  (1965) 
was  applied.  This  method  is  a  direct  search  procedure 
and  is  not  related  to  the  simplex  method  of  linear  programming. 
It  has  the  following  advantageous  properties. 

(i)  No  assumptions  are  made  about  the  surface  except 
that  it  is  continuous  and  has  a  unique  maximum 
in  the  area  of  search. 
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(ii)  This  method  does  not  involve  any  derivatives, 
(iii)  This  method  is  effective  and  computationally 
compact . 

The  details  of  the  simplex  method  are  as  follows.  Suppose 

the  problem  is  to  find  the  maximum  of  some  function 

f (x^/X2 / • • • /X  ) .  Since  f  has  p  variables,  we  need 

to  evaluate  f  at  (p+1)  trial  values  of  x,  denoted  here 

by  Aq,A^,...,A  .  Assume  that  these  points  lie  on  the 

vertices  of  an  irregular  simplex  in  (p+1)  space  and  that 

f ( Aq )  =  min  f (A^) .  In  this  case,  a  reflection  is  made 

through  the  point  C  (centroid  of  face  opposite  Aq)  to  a 

point  B  where  B  =  Aq  +  2(C-Aq) .  One  version  of  the 

simplex  method  consists  of  replacing  Aq  by  B,  relabeling 

the  points  so  that  f(AQ)  is  the  smallest,  and  repeating 

the  process  of  replacing  the  worst  point  by  its  reflection  B. 

A  more  sophisticated  version  of  the  algorithm  was  actually 

used,  in  which  Aq  is  replaced  by  D,  where  D  is  of  the 

form  Aq  +  d(C-AQ) .  One  of  four  possible  values  of  d  are 

used,  depending  on  the  relationship  of  f(B)  to 

f (Aq) , . . . , f (A  ).  (Fig.  6. 1.1-6. 1.4  shows  p=2). 

u  p 

Case  1.  If  f(B)  >  max  (f (A, ),..., f (An) ) ,  then  an 
extension  is  made  where  d  =  3.  (See  Fig.  6.1.1) 
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Fig .  6.1.1 

(Dashed  lines  show  contours  of  f) 
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Case  2.  If  f(B)  <  f(Ag),  then  a  contraction  is 
made  where  d  =  0.5  (See  Fig.  6.1.2) 


Fig .  6.1.2 

Case  3.  If  f(AQ)  <  f(B)  <  min  ( f  (A^  ,  . .  .  ,  f  ( A  )  )  , 
then  a  contraction  is  made  where  d  =  1.5  (See  Fig.  6.1.3) 


Fig.  6.1.3 
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Case  4.  If 

max  ( f (A. ) , . . . , f (A  ) )  <  f (B)  <  min  ( f ( A, )  (A  ) )  • 

^  ST  X 

then  a  reflection  is  made  where  d  =  2  (i.e.  D  =  B) . 

(See  Fig.  6.1.4) 


Fig.  6.1.4 


6.2.  Swindle 

Whenever  we  try  to  do  experimental  sampling  in  a 
computer  simulation,  it  is  wise  to  try  to  find  a  restate¬ 
ment  of  the  problem  which  reduces  the  amount  of  computation 
required  to  achieve  the  desired  precision  in  results.  The 
restatement  we  consider  here  is  called  a  Monte  Carlo 
swindle,  (See  Gross  (1973)).  Let  (x^,  i  =  l,2,...,n) 
be  a' sample  of  size  n  from  some  symmetric  distribution  G 
of  a  random  variable  x  which  has  the  form  x  =  u/v 
where  u  is  N(0,1)  and  v  is  independent  and  positive. 
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A  number  of  important  distributions  (Student's  t,  Cauchy, 
double  exponential,  slash)  are  of  that  form.  If  v  =  1, 

G  is  N(0,1),  and  if  v  has  the  Uniform  (0,1)  distribution 
x  has  the  slash  distribution. 

u . 

Our  observations  thus  are  x.  =  —■  .  Given 

i 

vi ,  x.  'V'  N(0,v^)  .  Let 


~  2  2 
x  =  Ex.v.  /Eu.  , 

li  i 


and 


2 

s 


V  . 


Also,  let 


Ci  =  i  =  1,2,. ..,n 

represent  the  elements  of  the  configuration  vector  c. 

A 

It  is  known  that  c,  s  and  x  are  conditionally  independent 
given  v.  Moreover  the  conditional  distributions  of  x 
and  s^  are  N  (0 ,  ( Ev^)  “■*■)  ancj  that  of  x^_j_/(n-l). 

Now  let  Tn  be  a  symmetric,  scale  and  location  invariant, 
statistics  of  the  sample  x.  Then 
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Tn[{-  ~  a*!)/15]  =  [Tn(x)  “  a]/b 


and 


T  (x)  =  -  T  (-x) 
n  —  n  — 


and  consequently  ETn(x)  =0  if  the  expectation  exists. 
While  the  variance  of  Tn  can  be  estimated  directly  by 
simulation,  we  shall  express  this  variance  in  terms  of 
known  quantities  and  of  the  expectation  of  a  conditional 
expectation  which  has  smaller  variance  and  can  therefore 
be  estimated  more  precisely.  Thus 


ET  2  =  E(E (T  2 | v,c) } 
n  n  —  — 


=  E  [E {  [x  +  sT  ( c )  ]  2  |  v ,  c  }  ] 


n  — 


=  [— ^  +  T  2(c)  ] 
n  Z  n  — 

lvi 


since 

E  [x  *  s  |  v,  c]  =  0,  E[x  |v,c]  = 

1/fv^  and 

E(s2|v,c)  =  1. 

Now 

-1 

T  (c)  =  s  [T  (x)  -  x]  tends 
n  —  n  — 

to  be  less  variable 

than 

T  (x)  and  its  variance  can 
n  — 

be  estimated  directly 

from 

N  simulations  by 

^  9  _  1  •y 

E(T  (£)  )  =  N  l  T  (c.  )  . 

j  =  l  n  J 
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Also  the  variance  of  this  estimate  is  estimated  by 

N  4  i  N  ? 

{  L  T  (£.)  "  N  I  l  Tn(£-|)  ]  >/N(N-l)  .  (6.2.1) 

j=l  J  j=l 

2-1  -1 

Finally,  for  the  normal  G,  E  [  ( Ev^  )  ]  =  n 

2 

For  the  slash  distribution,  Ev.  =  n/3  +  0(1)  and 

2.  p 

E{(Ev^2)  ^}  =  3n  ^  +  0 (n  2)  and  this  substantial 

2 

contribution  to  E [T  1  need  not  be  involved  in  the 

1  n  J 

simulation. 

6.3.  Variance  reduction  for  the  logistic  distribution. 

The  Monte  Carlo  Swindle  is  not  applicable  to  the 

logistic  distribution.  Here  we  use  another  principle. 

If  our  statistic  T  is  highly  correlated  with  another 

statistic  T'  whose  variance  is  known,  the  variance  of 

T  can  be  expressed  in  terms  of  that  of  T'  and  of  a 

2 

relatively  small  part  of  T  left  over  from  the  linear 

2  2 

approximation  to  the  regression  of  T  on  T'  . 

Let  T  and  T'  have  mean  0.  We  write, 

T2  =  aT'2  +b+u=T2+u  (6.3.1) 

-  2 

where  T  is  the  linear  approximation  to  the  regression 


and 
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a  =  Cov  (T2 ,T' 2)/var  (T'2) 


then 


E(T2)  =  E[T2-aT'2] 


+  aE  (T  *  ) 


(6.3.2) 


(6. 3. ‘3) 


and 


Var (T2 )  =  Var  (T2-aT’2)  +  a2  Var  (T'2)  (6.3.4) 

2 

Thus  if  a  and  E(T'  )  are  known,  we  can  use  the  sample 

2  2 

to  estimate  E(T  -aT'  )  which  has  relatively  small  variance 

2  2 

if  the  correlation  of  T  with  T'  is  high.  If  a  is 
unknown,  it  too  could  be  estimated  from  the  simulation. 

While  the  precise  value  of  a  is  necessary  for  (6.3.4), 
an  approximate  value  will  serve  for  (6.3.3)  which  is  the 
essential  equation  to  exploit. 

For  the  logistic  distribution  we  used  the  mean  X 

2 

for  T' .  Then  E(T'  )  =  3.28987.  As  a  simulation  results, 
we  get  a  =  .897,  .908,  .768  and  variance  reductions  are 
89.3%,  82.5%,  71.8%  where  sample  size  n  =  1000,  100,  20 
respectively. 
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Chapter  7 
Conclusions 

Simulation  studies  were  conducted  assuming  the  data 
came  from  Normal,  Logistic,  and  Slash  populations  with 
sample  sizes  20,  100  and  loOO.  The  NPS  estimate  seems 
to  be  more  efficient  compared  with  other  adaptive  estimates, 
such  as  JBT  and  tTliD,  specially  for  medium  (100)  to 
moderately  large  (1000)  sample  sizes.  We  have  shown  that 
the  NPS  estimate  of  location  has  lower  asymptotic  variance 
than  Huber's  M-estima tor  in  most  cases,  regardless  of 
Huber's  choice  of  k. 

By  a  sensitivity  curve  analysis,  we  show  that  the 
NPS  estimate  of  location  guarantees  resistance  to  outliers. 

For  the  two-sample  location  problem,  we  propose  an 
iterative  method  to  estimate  the  shift  parameter  when  the 
scale  parameters  may  be  unequal.  We  proved  that  this 
iterative  method  converges  to  the  desired  M-estimate  for 
an  arbitrary  scale  and  location  family  of  symmetric 
distributions . 

Finally  we  proposed  an  asymmetric  family  of  NPS 
distributions  which  can  be  used  to  generalize  many  of 
our  results  to  help  analyze  asymmetric  data. 
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Appendix 


***  SIMPKIM 

V  Z+SIMPKIM  DATA  ;MI  ;SI  ;GI  ;I  ;KK;AA  ;  YY  ;MED  ;TEM:NEW ;  VAL  ;  DUM 


o  MLE  CALCULATION  FOR  SYMMETRIC  NPS  DISTRIBUTION 
o  USING  SIMPLEX  METHOD 


+ERROR  IFipDATA )<10 

AA+  3  4  pO 

YY+4pO 

MI+0 . 5  ORDERSTAT  DATA 

SI+0 . 5x (o . 9  ORDERSTAT  DATA ) - ( 0 . 1  ORDERSTAT  DATA) 

GI+ 0 

DUM+( tpDATA  )*0 . 5 

o - SET  initial  4  POINTS 

AA [ ; 1 ] +MI , SI , GI 
AA  [  ;  2]  +DUM )  ,SI ,  GI 

AA [  ;  3 ] -*-MI ,  (SI+DUM)  ,GI 
AAlml+MI  ,SI ,  (GI  +  Q.l) 

-  CALCULATE  LIKELIHOOD  FUNCTION  VALUE  FOR  4  POINTS 

LOOP: 1^1 

LliYYZIl+DATA  LI  KEF  UN  jiAC;J3 
J+I+l 

-►LI  IF  1$ 4 

p - FTND  THE  POINT  WHICH  HAS  MINIMUM  LIKELIHOOD  VALUE 

AA+AA C ; &YY] 

YY+-YYZ&YY1 

MED*-  (>li4  [  ;  2  ]  +AA  [  ;  3  ]  C  ;  4  ]  )*3 


TEM+-(2*MED)-AAZ;ll 
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VAL+DATA  LIKEFUN  TEM 
+12  IF  VAKYYZ 21 
+m  IF  VAL>IYl 4] 

o . REFLECTION 

AALiU+TEM 

+L5 

o . CONTRACTION 

L2 : NEW* 0 . 5  xMED+TEM 
&'NEW+Q.  5*MED+AAZ;1]  '  IF  VAKYYIU 
VAL+DATA  LIKEFUN  NEW 
+L2  IF  VAL<YYl2) 

AAlill+NEW 

+L5 

L3  :AA[;l]«-0.5xAA[;l] +AA[{4] 

AA[;  2]«-0  . 5xAA[ ;2] +AA  [ ;4] 

AA  C ;  3  ]  +0 . 5  *AA  C  ;  3  ]  +AA  [  ;  4  ] 

+L5 

fl . EXTENSION 

L4 : NEW+ ( 2* TEM ) -MED 
VAL+DATA  LIKEFUN  NEW 
AALill+NEW 

$' AAZ;l]+TEM'  IF  VAL>YYZ 1] 

o - CHECK  FOR  STOP  ITERATIONS 

L5 :MED+ ( +/AA  )  +  4 

KK+(+/(AA[;l]-MED)*2)+(+/(AA[;2]-MED)*2) 
KK+(KK+(+/(AA[;3] -MEL )*2 )+ ( +/ (AA [ ; 4] -MED)* 2) )*0.5 
+LOOP  IF  KK> 0.001 
Z«-AA[;4] 

♦ 0 


££325 :  t>  '  Z’OO  SMALL  DATA  NUMBERS' 


r 
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+0 

V 


***  LIKEFUN  ** 


V  Z+DATA  LIKEFUN  P;L;R;M1;M2;M3\M^;PA\  DUM1 ;  DUM2  ;  DUM3  ;  DUMU  ;  MJP ;  MAD 


p 


p  LOG  LIKELIHOOD  FUNCTION  FOR  SYMMETRIC  NPS  DISTRIBUTION 
p  DATA  LIKEFUN  ( M,S,G ) 

p - 


PA-e-mrffsrAB  pc 3] 

MID+l/DATA 
MAD*? /DATA 
+L0  IF  PC  3 ] >0 

+L5  IF (,MID<PLH  -PC2]  xi-p^Cl]  -5-PC3]  ) 

+L5  JP(MD>PCl]+PC2]xi-PACl]-i-PC3]  ) 

L0:L«-PCl]-PC2] 

P«-PCl]+PC2] 

Ml  DAT  A  IF(.DATAZL ) 

M2+DATA  IF ( {DATA>L )a (DATA^R  )  ) 

M3«-D>irA  JP(P^r^>P) 

-►LI  JP(  |PC3]  )£1P‘6 

Drai-s-C  (pWl  )x®(0.  l  +  PACl]xPC2]  )  )-(l++PC3]  )  x  +  /®  (  1 +PC  3  J  x  (L-Ml  )+PC2]xp 

Cl]  ) 

+L2 

Ll :  DUMl*-(  (pMl  )x®(0.  l+P£Cl]xPC2]  ))-  +  /(  (L-Ml  )+PC2]  *PA  Cl]  ) 
L2:PPM2«-((pM2)x®+PC2]  )  ++/  (  (PA  C2]  x  (W2-PC1]  )x  (M2 -PCI]  )+PC2]  xPC2]  )+PA 
3]  ) 


f 
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+L3  IF( |PC3] )£1F"6 

DUM 3«-((pM3  )x®(o.l*PACl]xPC2] ) )-(l++PC3] )x+/® ( 1+PC3 ] x (M3 -fl )*PC 2] xPA 
Cl]  ) 

+L4 

L3  :DUM3-*-(.  (pM3  )x®(o.l*PA[l]xPC2]  ))-  +  /(  (M3  -R  )+PC2]  *PA  Cl]  ) 

L4 : Z+DUM1 +DUM2 +DUM3 
+0 

L5:Z«-"9999999999999 


•*•0 

7 

7  Z+NEWEgTAB  G-,I \F  ;A-,B\C 

fl - 

p  CALCULATE  C ,A ,B  USING  TABLE  TAB 
n  iVFJ/FSZyiB  (C) 

n - 


***  NEWEST AB  *** 


+ LOVER  IF  GZ 1.9 
+LBELOW  IF  G<~ 0.499 
F«-(G+0.5)xl000 
I«-LF 

orasc-n  +  oviBcr+i]  -tabzii  )x  cp-j) 

A«--(l+G)  +  2xC 

— A  +  (  ®C  )  +  2. 302585093 
Z-*-C,.A  , B 
->0 

LOVER :Z+ESTAB  C, "1.68605 
->0 

LBELOW : Z+ESTAB  G, "0.64184 
->0 


7 
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*** 


E£TAB  **■> 


V  Z+ESTAB  P',C;C;A;B;INT;M1;M2;M 


fi  CALCULATE  C,A,B  FOR  NPS  DISTRIBUTION 
n  (.C,A,B)+E£TAB  ( G,AI ) 


G«-PCl] 
jwj+o .  1 
4<-P[2] 

-*J4  JF  G=~l 
Ll:A+A+(-INT) ,0 ,INT 
&'A+A L". 000001'  JF  G>~ 1 
st'A^r. 000001'  JF  C<"1 
fl«-( )+®-i^-5xi+c: 

+J2  JF  O^r/A 

Ml*-(*fl)x(-3 .141 5926 54*4 )*0 .5 
M2<r(-2*A)*0 .5 
M+M1*(NDTR  M2)-(NDTR-M2) 

■+L3 

L2:M+(A,B)  INTEG ( ( -1 ) , 1 ) 

J3  :M«-|  W-0 . 8 
DUM+M- L /M 

®  1 INT+INH2  '  JF  1  =  0£/M[2] 

A+1+0CW/4 

-►Ll  JF(L/M)>1F“6 

O-  (1+G)*2x4 

B-* —  (2.302585093  +®C  )  +4 


+LAST 


92. 

L4:4«-0 
5«~0. 91629 
•■01  +  4 

LAST'.Z+C  ,A,B 
■+Q 
7 


***  NDTR  **  * 

7  P+NDTR  X;T 

a  THIS  PROGRAM  COMPUTES  TEE  AREA  UNDER  THE  CURVE  OF  THE  STANDARD  NOR 
MAL  DENSITY. 

0+1+0. 2316419*  |X 

P+  0.3193815  “0.3565638  1.781478  "1.821256  1.330274 
P«-| (X£0)-(0.398  9423x*"0.5xXxX)x(0.*i5)  +  .xP 
7 

***  INTEG  *** 

7  Z«-P  INTEG  VECl;H;Ml;M2;M3iM^iFROM;TOiSUMl;SUM2;SUM3iSUM;TEMP\OLD 
; NEW; SI ;S2 ;NUM;STEP 


p  INTEGRATION  PROGRAM  FOR  ESTAB  ( MODIFIED  SIMPSON'S  RULE ) 


XOPCl  2  3] 

XS-«-P[4  5  6] 

FROM+VEC1Z11 

T0+VEC1Z21 

Ml+TO-FROM 

NUM<-  4 

H+M 1+8 

M3+FROM+H*(l  357) 


M4«-FP0M+5x2*(i3  ) 
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Sl-«-5  0l  ( (X.A0  .  x  (M3*M3  )  )  +  (XS°  .x(4pl))) 

SUM1++/*S1 

52^50  L((X>lo.x(W4xW4))+(XSo.x(3pl))) 

SUM2++/ *S2 

SUM3<r(*50L(XA*(FROM*2))+XB)  +  (*50l  (XA*  (TO*2  )  )+XB  ) 

SUM+SUM3+(**SUM1 )+2*SUM2 
OLD+SUMxH* 3 
STEP* 0 
LOOP:H+H t2 
srpp«-srpp+i 
NUM+NUM* 2 
TEMP+SUM-SUM 1x2 
M3+FROM+Hx  ( 2 x  ( i  JVOTJ )  )  - 1 
S1+5  0L  (  (XA«.x(M3x«3  ))  +  (XB*.x(jytfMpl))) 

StfMl«-+/*Sl 

SUM-«-rEMP+4xSC/Ml 

WEJ7«-Sraxtf*3 

MC+WEJitel 

TEMP+ T / I {NEW -OLD )*TAG 
OLD+NEW 

+LOOP  IF(TEMPZ1E~8 )a(STEPZIQ ) 

Z+NEW 

7 

***  ORDERSTAT  ** 

7  Z«-P  OPPEPSTAT  X;Y  ;N 

N+  pX 

Y«-X[AX] 

Z^-YCr^xP] 
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